
[not merge] xpu glm test#7748

Open
zhupengyang wants to merge 6 commits into PaddlePaddle:develop from zhupengyang:glm_docker_1_merge

Conversation

@zhupengyang
Collaborator

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented May 8, 2026

Thanks for your contribution!


@PaddlePaddle-bot

PaddlePaddle-bot commented May 8, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-14 00:48:15

The CI report is generated from the code below (refreshed every 30 minutes):


1 Task Overview

All required tasks have passed (this PR has no required tasks); 1 optional task failed (does not block merging).

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 2 (0) | 2 | 1 | 1 | 0 | 0 | 0 |

2 Task Status Summary

2.1 Required tasks: 0/0 passed

This PR has no required tasks (no mandatory CI is configured in the branch protection rules).

2.2 Optional tasks: 1/2 passed

Optional tasks do not block merging; failures are informational only.

| Status | Task | Duration | Log | Rerun |
| --- | --- | --- | --- | --- |
| ❌ | Trigger Jenkins for PR | 1m1s | Job | - |
| ✅ | The remaining 1 optional task passed | - | - | - |

3 Failure Details (required only)

No failed required tasks.



@PaddlePaddle-bot PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-05-14 00:44:40

📋 Review Summary

PR overview: fixes issues with GLM inference on XPU (Kunlunxin), improves KVCache write/prefetch robustness, fixes a local_scheduler bug, and adds support for the splitwise interrupt command.

Scope of changes: cache_manager/, scheduler/local_scheduler.py, model_executor/layers/sample/sampler.py, worker/xpu_model_runner.py, splitwise/

Impact tags: [KVCache] [Scheduler] [XPU] [PD Disaggregation] [OP]

📝 PR Compliance Check

The title [not merge] xpu glm test contains no official tag, and every section of the PR description is an empty placeholder; this is non-compliant.

Suggested title (copy-paste ready):

  • [BugFix][XPU] Fix XPU sampling params, local scheduler recycle bug, and KVCache storage robustness

Suggested PR description (copy-paste ready; it must reproduce the full structure of the checklist §D2 template):

## Motivation
Fix several issues encountered when running GLM model inference on XPU (Kunlunxin), including out-of-range sampling parameters, IndexError/cursor bugs when the local scheduler recycles requests, robustness of KVCache storage write timeouts and prefetch failures, and add support for the splitwise interrupt_requests control command.

## Modifications
- `fastdeploy/model_executor/layers/sample/sampler.py`: use a 32-bit MAX_INFER_SEED (2147483646) on the XPU platform; change the decoder offset multiplier to 32
- `fastdeploy/scheduler/local_scheduler.py`: fix bugs in `_recycle` where `ids.index` could raise ValueError and the cursor was decremented unconditionally; fix batch removal of expired IDs using the wrong index
- `fastdeploy/cache_manager/prefix_cache_manager.py`: when GPU blocks are insufficient, log a warning and skip the storage prefetch; add try/except protection on the prefetch path; truncate token_ids to the actual block size on storage writes
- `fastdeploy/cache_manager/cache_transfer_manager.py`: move the `flush_token_index` call from the `write_back_storage_task` finally block to the start of `_run_write_back_storage`
- `fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py`: change bulk writes to sliced writes, with both total and per-slice timeout control
- `fastdeploy/worker/xpu_model_runner.py`: return None early when `ids_remove_padding` is empty
- `fastdeploy/splitwise/internal_adapter_utils.py`: add handling for the `interrupt_requests` control command

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Issues

| Level | File | Summary |
| --- | --- | --- |
| 🔴 Bug | fastdeploy/model_executor/layers/sample/sampler.py:103 | `local_pos * 32` has no XPU platform guard and changes sampling behavior on all hardware |
| 🔴 Bug | fastdeploy/cache_manager/prefix_cache_manager.py:1143 | the `enable_output_caching` check was removed, so output tokens are written to the storage cache unconditionally |
| 🟡 Suggestion | fastdeploy/scheduler/local_scheduler.py:157 | A4 multi-implementation sync: has the `_recycle` bug fix been propagated to the global/dp/splitwise schedulers? |
| ❓ Question | fastdeploy/cache_manager/cache_transfer_manager.py:934 | the repeated `storage_backend_type == "attention_store"` check inside the `elif attention_store` branch is redundant |

Overall Assessment

The PR contains important fixes across multiple modules, and the overall direction is correct. However, two P0 issues (the XPU sampling offset lacks a platform guard and affects all platforms; the output caching change breaks existing behavior) must be explicitly resolved before merging. Given the [not merge] title, we suggest opening a formal PR after addressing the issues above.

```diff
 offsets = paddle.where(
     is_decoder,
-    local_pos * 4,
+    local_pos * 32,
```

🔴 Bug: `local_pos * 32` has no XPU platform guard and takes effect on all hardware.

In the same function, the MAX_INFER_SEED change is correctly guarded with `if current_platform.is_xpu()`, but the offset multiplier change from 4 to 32 here has no platform check and will alter sampling randomness behavior on GPU, DCU, and other hardware.

Suggested fix:

```python
if current_platform.is_xpu():
    offsets = paddle.where(
        is_decoder,
        local_pos * 32,
        paddle.zeros_like(local_pos),
    )
else:
    offsets = paddle.where(
        is_decoder,
        local_pos * 4,
        paddle.zeros_like(local_pos),
    )
```

If 32 is the correct value on all platforms (and 4 was a historical bug), please explain why in the PR description.

```diff
-    input_token_ids = token_ids + request.output_token_ids
-else:
-    input_token_ids = token_ids
+input_token_ids = token_ids + request.output_token_ids
```

🔴 Bug: with the `enable_output_caching` check removed, output tokens are written to the storage cache unconditionally, breaking expected behavior for users who do not enable output caching.

Meaning of the original code: output token ids are appended to the storage key computation only when `enable_output_caching=True`; when it is False, only input tokens form the storage key. With the check removed, every user writes output blocks to storage, which may:

  1. Make prefetch (which sees only input tokens) unable to ever hit the stored blocks carrying an output suffix, turning the writes into wasted work
  2. Break behavior for scenarios that deliberately avoid caching output (security/privacy requirements)

Suggestion: restore the original conditional, or explicitly state the intent of this change in the PR.
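For illustration, the behavior the reviewer asks to restore can be sketched as follows. This is a minimal sketch; `build_storage_key_tokens` and its parameters are hypothetical stand-ins, not the actual FastDeploy API:

```python
# Hypothetical sketch of the conditional the review asks to restore.
def build_storage_key_tokens(token_ids, request, enable_output_caching):
    # Only append generated tokens to the storage key when the user
    # explicitly opted in to output caching; otherwise the key is the
    # input tokens alone, so input-only prefetch can still hit it.
    if enable_output_caching:
        return token_ids + request.output_token_ids
    return token_ids
```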

```diff
         expired_ids.append(request.request_id)

-    for i, expired_id in enumerate(expired_ids):
+    for expired_id in expired_ids:
```

🟡 Suggestion: A4 multi-implementation sync check. `_recycle` fixes the following bugs:

  1. `ids.index(request_id)` could raise ValueError (now caught)
  2. `ids_read_cursor` was decremented unconditionally (now guarded by `if idx < self.ids_read_cursor`)
  3. `self.ids.pop(i)` used the enumeration index rather than the actual position (now replaced with `self.ids = self.ids[len(expired_ids):]`)

Please check whether the same patterns exist in global_scheduler.py / dp_scheduler.py / splitwise_scheduler.py and apply the fix there if needed, so the other scheduler implementations do not hit the same issues.
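The three fixes can be sketched as one minimal stand-alone function. The names are simplified stand-ins for LocalScheduler internals, and the final slice mirrors the reviewed code's assumption that expired ids form the list prefix; this is an illustration, not the actual implementation:

```python
def recycle(ids, ids_read_cursor, expired_ids):
    for expired_id in expired_ids:
        # Fix 1: ids.index() raises ValueError if the id is already gone;
        # catch it instead of crashing.
        try:
            idx = ids.index(expired_id)
        except ValueError:
            continue
        # Fix 2: only move the read cursor when the expired id sits before
        # it, instead of decrementing unconditionally.
        if idx < ids_read_cursor:
            ids_read_cursor -= 1
    # Fix 3: drop the expired prefix by slicing, rather than pop(i) with a
    # stale enumerate index.
    ids = ids[len(expired_ids):]
    return ids, ids_read_cursor
```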


```python
elif self.storage_backend_type == "attention_store":
    try:
        if (self.rank == 0) and self.storage_backend_type == "attention_store":
```

❓ 疑问if 条件中的 self.storage_backend_type == "attention_store" 是冗余判断——代码已处于外层 elif self.storage_backend_type == "attention_store": 分支内,该条件必然为真。

Suggested simplification:

```python
if self.rank == 0:
    self.storage_backend.flush_token_index(task_id, token_ids, 0, False)
```
